Yeast de novo genes preferentially emerge from divergently transcribed, GC-rich intergenic regions
Abstract
New genes with novel protein functions can evolve “from scratch” out of intergenic sequences. These de novo genes can integrate into the cell's genetic network and drive important phenotypic innovations. Identifying de novo genes and understanding how the transition from noncoding to coding occurs are therefore key problems in evolutionary biology. However, identifying de novo genes is difficult: it is hampered by remote homologs, fast-evolving sequences and erroneously annotated protein-coding genes. To overcome these limitations, we developed a procedure that handles the usual pitfalls in de novo gene identification and predicted the emergence of 703 de novo genes in 15 yeast species from two genera whose phylogeny spans at least 100 million years of evolution. We established that de novo gene origination is a widespread phenomenon in yeasts, although only a few of these genes are ultimately maintained by selection. We validated 82 candidates, providing new translation evidence for 25 of them through mass spectrometry experiments. We also unambiguously identified the mutations that enabled the transition from noncoding to coding for 30 Saccharomyces de novo genes. We found that de novo genes preferentially emerge next to divergent promoters in GC-rich intergenic regions, where the probability of finding a fortuitous, transcribed ORF is highest. We also found a more than 3-fold enrichment of de novo genes at recombination hot spots, which are GC-rich, nucleosome-free regions, suggesting that meiotic recombination may be a major driving force of de novo gene emergence in yeasts.
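One reason a fortuitous ORF is more likely in GC-rich sequence is that the three standard stop codons (TAA, TAG, TGA) are AT-rich. The sketch below illustrates this with a deliberately simple model that is an assumption of this example, not the authors' procedure: bases are drawn independently, with P(A) = P(T) and P(G) = P(C) set by the GC content. The function names `stop_codon_probability` and `prob_stop_free_run` are hypothetical.

```python
def stop_codon_probability(gc_content: float) -> float:
    """Chance that one random codon is TAA, TAG or TGA.

    Assumes independent bases with P(A) = P(T) and P(G) = P(C)
    (an illustrative model, not taken from the paper).
    """
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must lie between 0 and 1")
    p_at = (1.0 - gc_content) / 2.0  # probability of A, and of T
    p_gc = gc_content / 2.0          # probability of G, and of C
    # TAA = p_at^3; TAG and TGA each = p_at^2 * p_gc
    return p_at ** 3 + 2.0 * p_at ** 2 * p_gc


def prob_stop_free_run(gc_content: float, n_codons: int) -> float:
    """Chance that n_codons consecutive random codons contain no stop codon."""
    return (1.0 - stop_codon_probability(gc_content)) ** n_codons


if __name__ == "__main__":
    # Compare a stop-free run of 100 codons across several GC contents.
    for gc in (0.3, 0.4, 0.5, 0.6):
        p_stop = stop_codon_probability(gc)
        p_run = prob_stop_free_run(gc, 100)
        print(f"GC={gc:.1f}  P(stop)={p_stop:.4f}  P(no stop in 100 codons)={p_run:.4f}")
```

Under this model the per-codon stop probability falls as GC content rises, so long stop-free stretches become more likely. Real intergenic sequence is not independent and identically distributed, so the numbers are only indicative.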
Similar resources
Assessing the influence of adjacent gene orientation on the evolution of gene upstream regions in Arabidopsis thaliana.
The orientation of flanking genes may influence the evolution of intergenic regions in which cis-regulatory elements are likely to be located: divergently transcribed genes share their 5' regions, resulting either in smaller "private" spaces or in overlapping regulatory elements. Thus, upstream sequences of divergently transcribed genes (bi-directional upstream regions, or URs) may be more cons...
The consequences of base pair composition biases for regulatory network organization in prokaryotes.
Given the dramatic variation in guanine-cytosine (GC) content observed in prokaryotes, from approximately 20% to approximately 75% GC, one wonders if these extreme biases in base pair composition affect the evolution of transcription factor-binding sites (BS). This letter shows that, along the wide range of GC content variation in bacteria, bacterial BS keep a high frequency of AT bases, roughl...
Ty1 insertions in intergenic regions of the genome of Saccharomyces cerevisiae transcribed by RNA polymerase III have no detectable selective effect.
The retrotransposon Ty1 of Saccharomyces cerevisiae inserts preferentially into intergenic regions in the vicinity of RNA polymerase III-transcribed genes. It has been suggested that this preference has evolved to minimize the deleterious effects of element transposition on the host genome, and thus to favor their evolutionary survival. This presupposes that such insertions have no selective ef...
In Silico Prediction of Evolutionarily Conserved GC-Rich Elements Associated with Antigenic Proteins of Plasmodium falciparum
The Plasmodium falciparum genome being AT-rich, the presence of GC-rich regions suggests functional significance. Evolution imposes selection pressure to retain functionally important coding and regulatory elements. Hence searching for evolutionarily conserved GC-rich, intergenic regions in an AT-rich genome will help in discovering new coding regions and regulatory elements. We have used eleva...
Role of NRF-1 in bidirectional transcription of the human GPAT-AIRC purine biosynthesis locus.
GPAT and AIRC encode enzymes for steps one and six plus seven respectively in the pathway for de novo purine nucleotide synthesis in vertebrates. The human GPAT and AIRC genes are divergently transcribed from a 558 bp intergenic promoter region. Cis-acting sites and transcription factors important for bidirectional expression were identified. A cluster of sites between nt 215 and 260 are essent...
Publication date: 2017